68 research outputs found
Controlled Text Generation for Black-box Language Models via Score-based Progressive Editor
Despite recent progress in language models, generating constrained text for
specific domains remains a challenge, particularly when utilizing black-box
models that lack domain-specific knowledge. In this paper, we introduce ScoPE
(Score-based Progressive Editor) generation, a novel approach for controlled
text generation for black-box language models. We employ ScoPE to facilitate
text generation in the target domain by integrating it with language models
through a cascading approach. Trained to enhance the target domain score of the
edited text, ScoPE progressively edits intermediate output discrete tokens to
align with the target attributes throughout the auto-regressive generation
process of the language model. This iterative process guides subsequent steps
to produce desired output texts for the target domain. Our experimental results
on diverse controlled generations demonstrate that ScoPE effectively
facilitates controlled text generation for black-box language models in both
in-domain and out-of-domain conditions, which is challenging for existing
methods.
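
The cascade described in this abstract can be illustrated with a toy sketch: a black-box generator emits a block of tokens, an editor revises that block to raise a target-domain score, and the edited block is appended to the context that conditions the next block. This is not the authors' implementation. ScoPE is a trained editor, whereas the sketch below swaps in a greedy token-substitution search, and `black_box_lm`, `score`, `vocab`, `n_blocks` and `block_len` are all hypothetical names introduced for illustration.

```python
from typing import Callable, List

Tokens = List[str]


def progressive_edit(block: Tokens,
                     score: Callable[[Tokens], float],
                     vocab: Tokens,
                     max_passes: int = 3) -> Tokens:
    """Greedy stand-in for a learned editor (an assumption, not ScoPE itself).

    Sweeps over the block and replaces a token whenever a substitution
    from `vocab` strictly raises the score; stops when a sweep changes nothing.
    """
    edited = list(block)
    for _ in range(max_passes):
        improved = False
        for i in range(len(edited)):
            best_tok, best = edited[i], score(edited)
            for cand in vocab:
                trial = edited[:i] + [cand] + edited[i + 1:]
                s = score(trial)
                if s > best:
                    best_tok, best = cand, s
            if best_tok != edited[i]:
                edited[i] = best_tok
                improved = True
        if not improved:
            break
    return edited


def cascaded_generate(prompt: Tokens,
                      black_box_lm: Callable[[Tokens, int], Tokens],
                      score: Callable[[Tokens], float],
                      vocab: Tokens,
                      n_blocks: int,
                      block_len: int) -> Tokens:
    """Alternate black-box generation and editing, block by block."""
    context = list(prompt)
    for _ in range(n_blocks):
        block = black_box_lm(context, block_len)  # only the output is observed
        # The edited block, not the raw one, conditions later generation.
        context.extend(progressive_edit(block, score, vocab))
    return context
```

The generator is only ever called as an opaque function, which is what makes the setting black-box: no gradients or logits are needed, only its output tokens.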
Successor-Predecessor Intrinsic Exploration
Exploration is essential in reinforcement learning, particularly in
environments where external rewards are sparse. Here we focus on exploration
with intrinsic rewards, where the agent transiently augments the external
rewards with self-generated intrinsic rewards. Although the study of intrinsic
rewards has a long history, existing methods focus on composing the intrinsic
reward based on measures of future prospects of states, ignoring the
information contained in the retrospective structure of transition sequences.
Here we argue that the agent can utilise retrospective information to generate
explorative behaviour with structure-awareness, facilitating efficient
exploration based on global instead of local information. We propose
Successor-Predecessor Intrinsic Exploration (SPIE), an exploration algorithm
based on a novel intrinsic reward combining prospective and retrospective
information. We show that SPIE yields more efficient and ethologically
plausible exploratory behaviour in environments with sparse rewards and
bottleneck states than competing methods. We also implement SPIE in deep
reinforcement learning agents, and show that the resulting agent achieves
stronger empirical performance than existing methods on sparse-reward Atari
games.
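
As a rough illustration of what "prospective" and "retrospective" information could mean here, the sketch below learns a tabular successor representation (SR) with the standard TD(0) update. It then forms an intrinsic reward from the SR entry for the transition (prospective) minus the column sum for the destination state (retrospective, aggregating how strongly all states predict it). The abstract does not state the reward formula, so this particular combination, the function names and the hyperparameters are assumptions for illustration only.

```python
def td_update_sr(M, s, s_next, alpha=0.1, gamma=0.95):
    """One TD(0) update of row s of the successor representation M (list of lists)."""
    for j in range(len(M)):
        target = (1.0 if j == s else 0.0) + gamma * M[s_next][j]
        M[s][j] += alpha * (target - M[s][j])


def intrinsic_reward(M, s, s_next):
    """Hypothetical reward: prospective term minus retrospective term."""
    prospective = M[s][s_next]                      # how strongly s predicts s_next
    retrospective = sum(row[s_next] for row in M)   # how strongly all states predict s_next
    return prospective - retrospective


def learn_sr_on_cycle(n_states, n_steps, alpha=0.1, gamma=0.95):
    """Learn the SR while walking deterministically around a ring of states."""
    M = [[0.0] * n_states for _ in range(n_states)]
    s = 0
    for _ in range(n_steps):
        s_next = (s + 1) % n_states
        td_update_sr(M, s, s_next, alpha, gamma)
        s = s_next
    return M
```

Under this assumed form, a destination that many states already lead to has a large column sum and so earns a lower reward than a rarely reached one.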
Prediction and Generalisation over Directed Actions by Grid Cells
Knowing how the effects of directed actions generalise to new situations
(e.g. moving North, South, East and West, or turning left, right, etc.) is key
to rapid generalisation across new situations. Markovian tasks can be
characterised by a state space and a transition matrix and recent work has
proposed that neural grid codes provide an efficient representation of the
state space, as eigenvectors of a transition matrix reflecting diffusion across
states, that allows efficient prediction of future state distributions. Here we
extend the eigenbasis prediction model, utilising tools from Fourier analysis,
to prediction over arbitrary translation-invariant directed transition
structures (i.e. displacement and diffusion), showing that a single set of
eigenvectors can support predictions over arbitrary directed actions via
action-specific eigenvalues. We show how to define a "sense of direction" to
combine actions to reach a target state (ignoring task-specific deviations from
translation-invariance), and demonstrate that adding the Fourier
representations to a deep Q network aids policy learning in continuous control
tasks. We show the equivalence between the generalised prediction framework and
traditional models of grid cell firing driven by self-motion to perform path
integration, either using oscillatory interference (via Fourier components as
velocity-controlled oscillators) or continuous attractor networks (via analysis
of the update dynamics). We thus provide a unifying framework for the role of
the grid system in predictive planning, sense of direction and path
integration: supporting generalisable inference over directed actions across
different tasks. Comment: In Proceedings of ICLR 202
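
The claim that one set of eigenvectors supports prediction for any translation-invariant action can be checked on a small ring of states. There every such transition structure is a circular convolution, so the discrete Fourier modes are shared eigenvectors and each action differs only in its eigenvalues (the DFT of its displacement kernel). The sketch below is a minimal stdlib-only illustration of that fact, not the paper's model; `kernel`, `predict` and `step_direct` are names introduced here.

```python
import cmath


def dft(x, inverse=False):
    """Naive O(N^2) discrete Fourier transform, adequate for small rings."""
    n = len(x)
    sign = 1 if inverse else -1
    out = []
    for k in range(n):
        total = sum(x[j] * cmath.exp(sign * 2j * cmath.pi * j * k / n)
                    for j in range(n))
        out.append(total / n if inverse else total)
    return out


def predict(p0, kernel, steps):
    """State distribution after `steps` repeats of one action, via the Fourier basis.

    kernel[d] is the probability of being displaced by d states (mod N),
    so it can mix a directed shift with diffusion.
    """
    eigvals = dft(kernel)   # action-specific eigenvalues
    coeffs = dft(p0)        # distribution expressed in the shared eigenbasis
    propagated = [c * lam ** steps for c, lam in zip(coeffs, eigvals)]
    return [v.real for v in dft(propagated, inverse=True)]


def step_direct(p, kernel):
    """One step computed directly in state space, for comparison."""
    n = len(p)
    return [sum(kernel[d] * p[(i - d) % n] for d in range(n)) for i in range(n)]
```

For example, on an 8-state ring the kernel `[0.2, 0.8, 0, 0, 0, 0, 0, 0]` represents a noisy "move one step east" action. Multi-step prediction then costs one elementwise power of the eigenvalues rather than repeated matrix multiplication, and a sequence of different actions is handled by multiplying their eigenvalues mode by mode.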
Structured recognition for generative models with explaining away
A key goal of unsupervised learning is to go beyond density estimation and sample generation to reveal the structure inherent within observed data. Such structure can be expressed in the pattern of interactions between explanatory latent variables captured through a probabilistic graphical model. Although the learning of structured graphical models has a long history, much recent work in unsupervised modelling has instead emphasised flexible deep-network-based generation, either transforming independent latent generators to model complex data or assuming that distinct observed variables are derived from different latent nodes. Here, we extend amortised variational inference to incorporate structured factors over multiple variables, able to capture the observation-induced posterior dependence between latents that results from “explaining away” and thus allow complex observations to depend on multiple nodes of a structured graph. We show that appropriately parametrised factors can be combined efficiently with variational message passing in rich graphical structures. We instantiate the framework in nonlinear Gaussian Process Factor Analysis, evaluating the structured recognition framework using synthetic data from known generative processes. We fit the GPFA model to high-dimensional neural spike data from the hippocampus of freely moving rodents, where the model successfully identifies latent signals that correlate with behavioural covariates.
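
"Explaining away" is easiest to see in a two-cause toy model, sketched below. Two latent causes A and B are independent a priori, and the observation C is their logical OR. Conditioning on C makes A and B dependent: learning that B is active removes most of the evidence for A. This is a generic textbook illustration rather than the model in the abstract, and the prior value of 1/10 is an arbitrary assumption.

```python
from fractions import Fraction
from itertools import product

PRIOR = Fraction(1, 10)  # assumed prior probability that each cause is active


def joint(a, b, c):
    """P(A=a, B=b, C=c) with independent causes and C = A OR B."""
    p_a = PRIOR if a else 1 - PRIOR
    p_b = PRIOR if b else 1 - PRIOR
    p_c = 1 if c == max(a, b) else 0
    return p_a * p_b * p_c


def posterior_a(**evidence):
    """P(A=1 | evidence), by enumeration over all joint states.

    Evidence is given as keyword arguments, e.g. posterior_a(c=1, b=1).
    """
    num = den = Fraction(0)
    for a, b, c in product((0, 1), repeat=3):
        state = {"a": a, "b": b, "c": c}
        if any(state[name] != value for name, value in evidence.items()):
            continue
        p = joint(a, b, c)
        den += p
        if a == 1:
            num += p
    return num / den
```

Here `posterior_a(c=1)` evaluates to 10/19, while `posterior_a(c=1, b=1)` falls back to the prior 1/10. A recognition model that treats the latents as independent given the observation cannot represent this coupling, which is the motivation for structured factors over multiple variables.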
Unsupervised representation learning with recognition-parametrised probabilistic models
We introduce a new approach to probabilistic unsupervised learning based on
the recognition-parametrised model (RPM): a normalised semi-parametric
hypothesis class for joint distributions over observed and latent variables.
Under the key assumption that observations are conditionally independent given
latents, the RPM combines parametric prior and observation-conditioned latent
distributions with non-parametric observation marginals. This approach leads to
a flexible learnt recognition model capturing latent dependence between
observations, without the need for an explicit, parametric generative model.
The RPM admits exact maximum-likelihood learning for discrete latents, even for
powerful neural-network-based recognition. We develop effective approximations
applicable in the continuous-latent case. Experiments demonstrate the
effectiveness of the RPM on high-dimensional data, learning image
classification from weak indirect supervision; direct image-level latent
Dirichlet allocation; and recognition-parametrised Gaussian process factor
analysis (RP-GPFA) applied to multi-factorial spatiotemporal datasets. The RPM
provides a powerful framework to discover meaningful latent structure
underlying observational data, a function critical to both animal and
artificial intelligence.